Exploratory Data Analysis

In [3]:
# Libraries for data manipulation
import pandas as pd
import numpy as np

# Libraries for visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
In [4]:
# Load the Coffee Dataset
df = pd.read_csv(r"C:\Users\nanav\Downloads\Coffee Dataset.csv")

Data Overview

We start by looking at the Coffee dataset, checking the first few rows to understand what the data looks like.
In [552]:
df.head(10)
Out[552]:
Submission ID What is your age? How many cups of coffee do you typically drink per day? Where do you typically drink coffee? Where do you typically drink coffee? (At home) Where do you typically drink coffee? (At the office) Where do you typically drink coffee? (On the go) Where do you typically drink coffee? (At a cafe) Where do you typically drink coffee? (None of these) How do you brew coffee at home? ... What is the most you'd ever be willing to pay for a cup of coffee? Do you feel like you’re getting good value for your money when you buy coffee at a cafe? Approximately how much have you spent on coffee equipment in the past 5 years? Do you feel like you’re getting good value for your money with regards to your coffee equipment? Gender Education Level Ethnicity/Race Employment Status Number of Children Political Affiliation
0 gMR29l 18-24 years old NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 BkPN0e 25-34 years old NaN NaN NaN NaN NaN NaN NaN Pod/capsule machine (e.g. Keurig/Nespresso) ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 W5G8jj 25-34 years old NaN NaN NaN NaN NaN NaN NaN Bean-to-cup machine ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 4xWgGr 35-44 years old NaN NaN NaN NaN NaN NaN NaN Coffee brewing machine (e.g. Mr. Coffee) ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 QD27Q8 25-34 years old NaN NaN NaN NaN NaN NaN NaN Pour over ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 V0LPeM 55-64 years old NaN NaN NaN NaN NaN NaN NaN Pod/capsule machine (e.g. Keurig/Nespresso), E... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 V0Gaxg 18-24 years old NaN At a cafe, At the office, At home, On the go True True True True False Pour over, French press, Espresso, Instant cof... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 AdzRL0 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 LbWda2 25-34 years old Less than 1 At a cafe False False False True False NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 EXQLWN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

10 rows × 111 columns

Basic Dataset Information

In [556]:
# Shape of the dataframe
df.shape
Out[556]:
(4042, 111)
In [558]:
# Name of each column in dataframe
df.columns.tolist()
Out[558]:
['Submission ID',
 'What is your age?',
 'How many cups of coffee do you typically drink per day?',
 'Where do you typically drink coffee?',
 'Where do you typically drink coffee? (At home)',
 'Where do you typically drink coffee? (At the office)',
 'Where do you typically drink coffee? (On the go)',
 'Where do you typically drink coffee? (At a cafe)',
 'Where do you typically drink coffee? (None of these)',
 'How do you brew coffee at home?',
 'How do you brew coffee at home? (Pour over)',
 'How do you brew coffee at home? (French press)',
 'How do you brew coffee at home? (Espresso)',
 'How do you brew coffee at home? (Coffee brewing machine (e.g. Mr. Coffee))',
 'How do you brew coffee at home? (Pod/capsule machine (e.g. Keurig/Nespresso))',
 'How do you brew coffee at home? (Instant coffee)',
 'How do you brew coffee at home? (Bean-to-cup machine)',
 'How do you brew coffee at home? (Cold brew)',
 'How do you brew coffee at home? (Coffee extract (e.g. Cometeer))',
 'How do you brew coffee at home? (Other)',
 'How else do you brew coffee at home?',
 'On the go, where do you typically purchase coffee?',
 'On the go, where do you typically purchase coffee? (National chain (e.g. Starbucks, Dunkin))',
 'On the go, where do you typically purchase coffee? (Local cafe)',
 'On the go, where do you typically purchase coffee? (Drive-thru)',
 'On the go, where do you typically purchase coffee? (Specialty coffee shop)',
 'On the go, where do you typically purchase coffee? (Deli or supermarket)',
 'On the go, where do you typically purchase coffee? (Other)',
 'Where else do you purchase coffee?',
 'What is your favorite coffee drink?',
 'Please specify what your favorite coffee drink is',
 'Do you usually add anything to your coffee?',
 'Do you usually add anything to your coffee? (No - just black)',
 'Do you usually add anything to your coffee? (Milk, dairy alternative, or coffee creamer)',
 'Do you usually add anything to your coffee? (Sugar or sweetener)',
 'Do you usually add anything to your coffee? (Flavor syrup)',
 'Do you usually add anything to your coffee? (Other)',
 'What else do you add to your coffee?',
 'What kind of dairy do you add?',
 'What kind of dairy do you add? (Whole milk)',
 'What kind of dairy do you add? (Skim milk)',
 'What kind of dairy do you add? (Half and half)',
 'What kind of dairy do you add? (Coffee creamer)',
 'What kind of dairy do you add? (Flavored coffee creamer)',
 'What kind of dairy do you add? (Oat milk)',
 'What kind of dairy do you add? (Almond milk)',
 'What kind of dairy do you add? (Soy milk)',
 'What kind of dairy do you add? (Other)',
 'What kind of sugar or sweetener do you add?',
 'What kind of sugar or sweetener do you add? (Granulated Sugar)',
 'What kind of sugar or sweetener do you add? (Artificial Sweeteners (e.g., Splenda))',
 'What kind of sugar or sweetener do you add? (Honey)',
 'What kind of sugar or sweetener do you add? (Maple Syrup)',
 'What kind of sugar or sweetener do you add? (Stevia)',
 'What kind of sugar or sweetener do you add? (Agave Nectar)',
 'What kind of sugar or sweetener do you add? (Brown Sugar)',
 'What kind of sugar or sweetener do you add? (Raw Sugar (Turbinado))',
 'What kind of flavorings do you add?',
 'What kind of flavorings do you add? (Vanilla Syrup)',
 'What kind of flavorings do you add? (Caramel Syrup)',
 'What kind of flavorings do you add? (Hazelnut Syrup)',
 'What kind of flavorings do you add? (Cinnamon (Ground or Stick))',
 'What kind of flavorings do you add? (Peppermint Syrup)',
 'What kind of flavorings do you add? (Other)',
 'What other flavoring do you use?',
 "Before today's tasting, which of the following best described what kind of coffee you like?",
 'How strong do you like your coffee?',
 'What roast level of coffee do you prefer?',
 'How much caffeine do you like in your coffee?',
 'Lastly, how would you rate your own coffee expertise?',
 'Coffee A - Bitterness',
 'Coffee A - Acidity',
 'Coffee A - Personal Preference',
 'Coffee A - Notes',
 'Coffee B - Bitterness',
 'Coffee B - Acidity',
 'Coffee B - Personal Preference',
 'Coffee B - Notes',
 'Coffee C - Bitterness',
 'Coffee C - Acidity',
 'Coffee C - Personal Preference',
 'Coffee C - Notes',
 'Coffee D - Bitterness',
 'Coffee D - Acidity',
 'Coffee D - Personal Preference',
 'Coffee D - Notes',
 'Between Coffee A, Coffee B, and Coffee C which did you prefer?',
 'Between Coffee A and Coffee D, which did you prefer?',
 'Lastly, what was your favorite overall coffee?',
 'Do you work from home or in person?',
 'In total, much money do you typically spend on coffee in a month?',
 'Why do you drink coffee?',
 'Why do you drink coffee? (It tastes good)',
 'Why do you drink coffee? (I need the caffeine)',
 'Why do you drink coffee? (I need the ritual)',
 'Why do you drink coffee? (It makes me go to the bathroom)',
 'Why do you drink coffee? (Other)',
 'Other reason for drinking coffee',
 'Do you like the taste of coffee?',
 'Do you know where your coffee comes from?',
 "What is the most you've ever paid for a cup of coffee?",
 "What is the most you'd ever be willing to pay for a cup of coffee?",
 'Do you feel like you’re getting good value for your money when you buy coffee at a cafe?',
 'Approximately how much have you spent on coffee equipment in the past 5 years?',
 'Do you feel like you’re getting good value for your money with regards to your coffee equipment?',
 'Gender',
 'Education Level',
 'Ethnicity/Race',
 'Employment Status',
 'Number of Children',
 'Political Affiliation']
In [560]:
# Datatype of each column in dataframe
df.dtypes
Out[560]:
Submission ID                                              object
What is your age?                                          object
How many cups of coffee do you typically drink per day?    object
Where do you typically drink coffee?                       object
Where do you typically drink coffee? (At home)             object
                                                            ...  
Education Level                                            object
Ethnicity/Race                                             object
Employment Status                                          object
Number of Children                                         object
Political Affiliation                                      object
Length: 111, dtype: object
In [562]:
# Flag which columns contain at least one missing value
df.isnull().any()
Out[562]:
Submission ID                                              False
What is your age?                                           True
How many cups of coffee do you typically drink per day?     True
Where do you typically drink coffee?                        True
Where do you typically drink coffee? (At home)              True
                                                           ...  
Education Level                                             True
Ethnicity/Race                                              True
Employment Status                                           True
Number of Children                                          True
Political Affiliation                                       True
Length: 111, dtype: bool
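The boolean flags above say only whether a column has any missing value, not how many. A minimal sketch of a per-column missing share follows; it uses a small invented frame (`toy`, with made-up values) rather than the survey data, and the same expression would apply to `df`.

```python
import pandas as pd

# Toy stand-in for the survey frame (values are invented for illustration)
toy = pd.DataFrame({
    "Submission ID": ["a", "b", "c", "d"],
    "Gender": ["Male", None, "Female", None],
    "Coffee A - Notes": [None, None, None, "fruity"],
})

# isna() gives a True/False frame; the column mean is the fraction missing
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Sorting the shares in descending order makes it easier to choose a cut-off for the column-dropping step in the cleaning section.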
In [564]:
# Summary statistics for the numeric columns only
df.describe()
Out[564]:
What kind of flavorings do you add? What kind of flavorings do you add? (Vanilla Syrup) What kind of flavorings do you add? (Caramel Syrup) What kind of flavorings do you add? (Hazelnut Syrup) What kind of flavorings do you add? (Cinnamon (Ground or Stick)) What kind of flavorings do you add? (Peppermint Syrup) What kind of flavorings do you add? (Other) What other flavoring do you use? Lastly, how would you rate your own coffee expertise? Coffee A - Bitterness ... Coffee A - Personal Preference Coffee B - Bitterness Coffee B - Acidity Coffee B - Personal Preference Coffee C - Bitterness Coffee C - Acidity Coffee C - Personal Preference Coffee D - Bitterness Coffee D - Acidity Coffee D - Personal Preference
count 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3938.000000 3798.000000 ... 3789.000000 3780.000000 3767.000000 3773.000000 3764.000000 3751.000000 3766.000000 3767.000000 3765.000000 3764.000000
mean NaN NaN NaN NaN NaN NaN NaN NaN 5.693499 2.141127 ... 3.310900 3.013228 2.223786 3.068646 3.071998 2.366836 3.064790 2.162729 3.858167 3.375930
std NaN NaN NaN NaN NaN NaN NaN NaN 1.948867 0.947163 ... 1.185953 0.992875 0.865389 1.113546 0.999267 0.921048 1.128431 1.081546 1.007973 1.452504
min NaN NaN NaN NaN NaN NaN NaN NaN 1.000000 1.000000 ... 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
25% NaN NaN NaN NaN NaN NaN NaN NaN 5.000000 1.000000 ... 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 1.000000 3.000000 2.000000
50% NaN NaN NaN NaN NaN NaN NaN NaN 6.000000 2.000000 ... 3.000000 3.000000 2.000000 3.000000 3.000000 2.000000 3.000000 2.000000 4.000000 4.000000
75% NaN NaN NaN NaN NaN NaN NaN NaN 7.000000 3.000000 ... 4.000000 4.000000 3.000000 4.000000 4.000000 3.000000 4.000000 3.000000 5.000000 5.000000
max NaN NaN NaN NaN NaN NaN NaN NaN 10.000000 5.000000 ... 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000

8 rows × 21 columns

Data Cleaning

In [7]:
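# Step 1: Preview the data with sparse columns removed. `thresh` is the minimum number of
# non-null values a column needs to be kept, so this keeps columns that are at least 70% filled
# (it drops columns with more than 30% NaN). The result is displayed but not assigned to df.
threshold = len(df) * 0.7
df.dropna(thresh=threshold, axis=1)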
Out[7]:
Submission ID What is your age? How many cups of coffee do you typically drink per day? Where do you typically drink coffee? Where do you typically drink coffee? (At home) Where do you typically drink coffee? (At the office) Where do you typically drink coffee? (On the go) Where do you typically drink coffee? (At a cafe) Where do you typically drink coffee? (None of these) How do you brew coffee at home? ... What is the most you've ever paid for a cup of coffee? What is the most you'd ever be willing to pay for a cup of coffee? Do you feel like you’re getting good value for your money when you buy coffee at a cafe? Approximately how much have you spent on coffee equipment in the past 5 years? Do you feel like you’re getting good value for your money with regards to your coffee equipment? Gender Education Level Ethnicity/Race Employment Status Political Affiliation
0 gMR29l 18-24 years old NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 BkPN0e 25-34 years old NaN NaN NaN NaN NaN NaN NaN Pod/capsule machine (e.g. Keurig/Nespresso) ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 W5G8jj 25-34 years old NaN NaN NaN NaN NaN NaN NaN Bean-to-cup machine ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 4xWgGr 35-44 years old NaN NaN NaN NaN NaN NaN NaN Coffee brewing machine (e.g. Mr. Coffee) ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 QD27Q8 25-34 years old NaN NaN NaN NaN NaN NaN NaN Pour over ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4037 PA44VP >65 years old 2 At home True False False False False Coffee brewing machine (e.g. Mr. Coffee) ... $6-$8 $4-$6 No Less than $20 Yes Female Master's degree White/Caucasian Retired Democrat
4038 vNgpPD >65 years old 2 At home True False False False False Coffee brewing machine (e.g. Mr. Coffee) ... $4-$6 $2-$4 No Less than $20 Yes Male Bachelor's degree White/Caucasian Retired Republican
4039 g5ggRM 18-24 years old 1 At a cafe, At home, On the go, At the office True True True True False Espresso, Pod/capsule machine (e.g. Keurig/Nes... ... $8-$10 More than $20 Yes $300-$500 Yes Male Some college or associate's degree White/Caucasian Employed full-time Democrat
4040 rlgbDN 25-34 years old 2 At home True False False False False Pour over ... $4-$6 $8-$10 Yes $100-$300 Yes Male Bachelor's degree White/Caucasian Unemployed Democrat
4041 0EGYe9 25-34 years old 1 At home True False False False False Pour over, French press, Espresso, Other ... $15-$20 $15-$20 Yes $500-$1000 Yes Female Doctorate or professional degree White/Caucasian Employed full-time Democrat

4042 rows × 67 columns
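Because `thresh` counts non-null values, it is easy to read the 70% cut-off the wrong way round. The sketch below, on an invented three-column frame, contrasts "keep columns that are at least 70% filled" with "drop only columns that are more than 70% missing". The frame and its column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame with 10 rows; the comments give the share of non-null values
toy = pd.DataFrame({
    "mostly_full": [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],                     # 90% filled
    "half_empty": [1, np.nan, 3, np.nan, 5, np.nan, 7, np.nan, 9, np.nan],   # 50% filled
    "mostly_empty": [np.nan] * 8 + [1, 2],                                   # 20% filled
})

# Keep columns with at least 70% non-null values (what thresh = 0.7 * len does)
strict = toy.dropna(thresh=round(len(toy) * 0.7), axis=1)

# Drop only columns that are more than 70% missing: require at least 30% non-null
lenient = toy.dropna(thresh=round(len(toy) * 0.3), axis=1)

print(list(strict.columns))   # columns surviving the strict rule
print(list(lenient.columns))  # columns surviving the lenient rule
```

Neither call modifies `toy`; the result has to be assigned to a name to be kept.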

In [9]:
# Step 2: Drop rows with too many NaN values (more than 50%)
df = df.dropna(thresh=df.shape[1] * 0.5, axis=0)
In [215]:
# Step 3: Preview the data with duplicate rows removed (the result is displayed, not assigned back to df)
df.drop_duplicates()
Out[215]:
Submission ID What is your age? How many cups of coffee do you typically drink per day? Where do you typically drink coffee? Where do you typically drink coffee? (At home) Where do you typically drink coffee? (At the office) Where do you typically drink coffee? (On the go) Where do you typically drink coffee? (At a cafe) Where do you typically drink coffee? (None of these) How do you brew coffee at home? ... What is the most you'd ever be willing to pay for a cup of coffee? Do you feel like you’re getting good value for your money when you buy coffee at a cafe? Approximately how much have you spent on coffee equipment in the past 5 years? Do you feel like you’re getting good value for your money with regards to your coffee equipment? Gender Education Level Ethnicity/Race Employment Status Number of Children Political Affiliation
15 Zd694B <18 years old 3 At home, At the office, At a cafe True True False True False Pour over, Espresso, Instant coffee ... NaN NaN NaN NaN Other Bachelor's degree Other Employed full-time More than 3 Democrat
17 QA5JYA 25-34 years old 1 At home, At the office, On the go True True True False False Pour over, Coffee brewing machine (e.g. Mr. Co... ... NaN NaN NaN NaN Female Bachelor's degree White/Caucasian Employed full-time NaN Democrat
34 ylqbBg 45-54 years old 2 At home, At the office, At a cafe, On the go True True True True False Pour over, French press, Espresso ... $8-$10 No $500-$1000 Yes Male Master's degree Other Employed full-time 2 No affiliation
39 BGboZR 18-24 years old 3 At home, At a cafe, On the go True False True True False Pour over, French press, Espresso, Other, Cold... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
41 YZzBdN 25-34 years old 2 At home, At the office True True False False False Pour over, Espresso ... More than $20 Yes $50-$100 Yes Male Master's degree Asian/Pacific Islander Unemployed NaN Independent
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4037 PA44VP >65 years old 2 At home True False False False False Coffee brewing machine (e.g. Mr. Coffee) ... $4-$6 No Less than $20 Yes Female Master's degree White/Caucasian Retired 2 Democrat
4038 vNgpPD >65 years old 2 At home True False False False False Coffee brewing machine (e.g. Mr. Coffee) ... $2-$4 No Less than $20 Yes Male Bachelor's degree White/Caucasian Retired 2 Republican
4039 g5ggRM 18-24 years old 1 At a cafe, At home, On the go, At the office True True True True False Espresso, Pod/capsule machine (e.g. Keurig/Nes... ... More than $20 Yes $300-$500 Yes Male Some college or associate's degree White/Caucasian Employed full-time NaN Democrat
4040 rlgbDN 25-34 years old 2 At home True False False False False Pour over ... $8-$10 Yes $100-$300 Yes Male Bachelor's degree White/Caucasian Unemployed NaN Democrat
4041 0EGYe9 25-34 years old 1 At home True False False False False Pour over, French press, Espresso, Other ... $15-$20 Yes $500-$1000 Yes Female Doctorate or professional degree White/Caucasian Employed full-time 1 Democrat

3650 rows × 111 columns
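Before removing duplicates it can help to count them. A small sketch on invented data follows; the `toy` frame and its values are hypothetical, and only the pandas calls carry over to the survey frame.

```python
import pandas as pd

# Toy frame in which the third row repeats the first one exactly
toy = pd.DataFrame({
    "Submission ID": ["a1", "b2", "a1"],
    "Gender": ["Male", "Female", "Male"],
})

# duplicated() marks every repeat of an earlier row
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() returns a new frame; assign it to keep the result
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(toy), len(deduped))
```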

Renaming Column Headers

In [11]:
# Sample renaming of specific columns
df.rename(columns ={
    'How do you brew coffee at home? (Pour over)':'PourOver',
    'How do you brew coffee at home? (French press)':'FrenchPress',
    'How do you brew coffee at home? (Espresso)':'Espresso',
    'How do you brew coffee at home? (Coffee brewing machine (e.g. Mr. Coffee))':'CoffeeBrewingMachine',
    'How do you brew coffee at home? (Pod/capsule machine (e.g. Keurig/Nespresso))':'CapsuleMachine',
    'How do you brew coffee at home? (Instant coffee)':'InstantCoffee',
    'How do you brew coffee at home? (Bean-to-cup machine)':'BeanToCupMachine',
    'How do you brew coffee at home? (Cold brew)':'ColdBrew',
    'How do you brew coffee at home? (Coffee extract (e.g. Cometeer))':'CoffeeExtract',
    'How do you brew coffee at home? (Other)':'OtherMachine'},inplace =True)
In [13]:
# Sample renaming of specific columns
df.rename(columns ={'On the go, where do you typically purchase coffee? (National chain (e.g. Starbucks, Dunkin))':'NationalCahin',
 'On the go, where do you typically purchase coffee? (Local cafe)':'LocalCafe',
 'On the go, where do you typically purchase coffee? (Drive-thru)':'DeiveThru',
 'On the go, where do you typically purchase coffee? (Specialty coffee shop)':'SpecialtyCoffeeShop',
 'On the go, where do you typically purchase coffee? (Deli or supermarket)':'SuperMarket',
 'On the go, where do you typically purchase coffee? (Other)':'OtherLocation'},inplace =True)
In [15]:
# Sample renaming of specific columns
df.rename(columns ={
 'Where do you typically drink coffee? (At home)':'AtHome',
 'Where do you typically drink coffee? (At the office)':'AtOffice',
 'Where do you typically drink coffee? (On the go)':'OnTheGo',
 'Where do you typically drink coffee? (At a cafe)':'AtCafe',
 'Where do you typically drink coffee? (None of these)':'NoneOfThese'},inplace =True)
In [17]:
# Sample renaming of specific columns
df.rename(columns ={
 'What kind of sugar or sweetener do you add? (Artificial Sweeteners (e.g., Splenda))':'ArtificialSweeteners',
 'What kind of sugar or sweetener do you add? (Granulated Sugar)':'GranulatedSugar',
 'What kind of sugar or sweetener do you add? (Honey)':'Honey',
 'What kind of sugar or sweetener do you add? (Maple Syrup)':'MapleSyrup',
 'What kind of sugar or sweetener do you add? (Stevia)':'Stevia',
 'What kind of sugar or sweetener do you add? (Agave Nectar)':'AgaveNectar',
 'What kind of sugar or sweetener do you add? (Brown Sugar)':'BrownSugar',
 'What kind of sugar or sweetener do you add? (Raw Sugar (Turbinado))':'RawSugar'},inplace =True)
In [19]:
# Sample renaming of specific columns
df.rename(columns ={'What kind of dairy do you add? (Whole milk)':'WholeMilk',
 'What kind of dairy do you add? (Skim milk)':'SkimMilk',
 'What kind of dairy do you add? (Half and half)':'HalfAndHalf',
 'What kind of dairy do you add? (Coffee creamer)':'CoffeeCreamer',
 'What kind of dairy do you add? (Flavored coffee creamer)':'FlavoredCoffeeCreamer',
 'What kind of dairy do you add? (Oat milk)':'OatMilk',
 'What kind of dairy do you add? (Almond milk)':'AlmondMilk',
 'What kind of dairy do you add? (Soy milk)':'SoyMilk',
 'What kind of dairy do you add? (Other)':'OtherMilk'},inplace =True)
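The rename dictionaries above are typed by hand, which is where spelling slips tend to creep in. As a hedged alternative, the sketch below derives a short name from the option text in parentheses. `short_name` is a hypothetical helper, not part of the notebook, and it does not reproduce every hand-picked name (for example, it yields `PodCapsuleMachine` rather than `CapsuleMachine`).

```python
import re

def short_name(column: str) -> str:
    """Turn 'Question? (Option text (e.g. X))' into 'OptionText'.

    Columns without a trailing '(option)' part are returned unchanged.
    """
    match = re.search(r"\?\s*\((.*)\)\s*$", column)
    if not match:
        return column
    # Drop a nested parenthetical such as "(e.g. Mr. Coffee)"
    option = re.sub(r"\s*\(.*\)", "", match.group(1))
    # Capitalize the first letter of each word and join without separators
    words = re.findall(r"[A-Za-z0-9]+", option)
    return "".join(w[:1].upper() + w[1:] for w in words)

examples = [
    "How do you brew coffee at home? (Pour over)",
    "How do you brew coffee at home? (Coffee brewing machine (e.g. Mr. Coffee))",
    "What kind of dairy do you add? (Half and half)",
    "Gender",
]
for col in examples:
    print(col, "->", short_name(col))
```

A mapping for `df.rename` could then be built with a dict comprehension over `df.columns`, after checking that no two columns collapse to the same short name (the several "(Other)" options would all become `Other`).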

Merging Sub-Columns

In [21]:
brewing_columns = [
 'PourOver',
 'FrenchPress',
 'Espresso',
 'CoffeeBrewingMachine',
 'CapsuleMachine',
 'InstantCoffee',
 'BeanToCupMachine',
 'ColdBrew',
 'CoffeeExtract',
 'OtherMachine',
   
]
# Convert all brewing method columns to boolean (True for non-zero/non-empty, False for NaN or 0)
df[brewing_columns] = df[brewing_columns].notna() & df[brewing_columns].astype(bool)

# Create the main column with the names of the brewing methods used
df['How do you brew coffee at home?'] = df[brewing_columns].apply(
    lambda row: ', '.join([col for col, val in row.items() if val]), axis=1
)
In [23]:
location =[
 'AtHome',
 'AtOffice',
 'OnTheGo',
 'AtCafe',
 'NoneOfThese']

# Convert all location columns to boolean (True for non-zero/non-empty, False for NaN or 0)
df[location] = df[location].notna() & df[location].astype(bool)

# Create the main column with the names of the places where coffee is typically drunk
df['Where do you typically drink coffee?'] = df[location].apply(
    lambda row: ', '.join([col for col, val in row.items() if val]), axis=1
)
In [25]:
buylocation =[
'NationalCahin',
'LocalCafe',
'DeiveThru',
'SpecialtyCoffeeShop',
'SuperMarket',
'OtherLocation']

# Convert all purchase-location columns to boolean (True for non-zero/non-empty, False for NaN or 0)
df[buylocation] = df[buylocation].notna() & df[buylocation].astype(bool)

# Create the main column with the names of the on-the-go purchase locations
df['On the go, where do you typically purchase coffee?'] = df[buylocation].apply(
    lambda row: ', '.join([col for col, val in row.items() if val]), axis=1)
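The merging pattern used in the three cells above can be checked on a toy frame. One consequence worth noting: a respondent who selected nothing ends up with an empty string, not NaN, so a later `.notna()` filter on the merged column will not exclude them. The frame below is invented, and the final `mask` step is an optional addition, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Toy sub-columns: True/False answers, with NaN for a skipped question
toy = pd.DataFrame({
    "PourOver": [True, np.nan, False],
    "Espresso": [True, np.nan, True],
})
cols = ["PourOver", "Espresso"]

# Same conversion as in the notebook: NaN and False both become False
flags = toy[cols].notna() & toy[cols].astype(bool)

# Join the names of the selected options for each row
merged = flags.apply(
    lambda row: ", ".join(col for col, val in row.items() if val), axis=1
)

# Optional: turn empty strings back into NaN so that notna() filters work
cleaned = merged.mask(merged == "")

print(merged.tolist())
print(cleaned.isna().tolist())
```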
In [27]:
# Strip the "years old" suffix from the age labels (the values remain categorical strings such as "25-34")
df['What is your age?'] = df['What is your age?'].str.replace('years old','')

Visualization and Data Insights

In [29]:
# Clean the age labels: remove the "years old" suffix and surrounding whitespace.
# (The earlier `==` lines only compared values and never modified the column, so they are dropped.)
df['What is your age?'] = df['What is your age?'].str.replace('years old', '').str.strip()

colors = ['goldenrod', 'lightblue', 'thistle', 'olivedrab', 'coral', 'mediumseagreen', 'slateblue']

# Count the occurrences using Seaborn's countplot
plt.figure(figsize=(8, 4))
ax = sns.countplot(data=df, x='What is your age?',hue='What is your age?',palette = colors)

# Adding bar labels on top of the bars
for bars in ax.containers:
    ax.bar_label(bars)

plt.title("Age Group Distribution")
plt.xlabel("Age Group")
plt.ylabel("Frequency")
plt.show()
[Figure: bar chart "Age Group Distribution" (x: Age Group, y: Frequency)]
In [31]:
# Create a horizontal bar chart
plt.figure(figsize=(10, 5))
ax = sns.countplot(y='Gender', data=df, hue='Gender', palette='viridis')

# Add bar labels
for bars in ax.containers:
    ax.bar_label(bars)

# Display the plot
plt.show()
[Figure: horizontal bar chart of respondent counts by Gender]
In [131]:
gender_counts = df[df['Gender'].isin(['Male', 'Female'])]['Gender'].value_counts()

# Plotting the pie chart
plt.figure(figsize=(4, 4))
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%', startangle=140, colors=["SkyBlue", "Coral"])
plt.title("Gender Distribution in Dataset")
plt.show()
[Figure: pie chart "Gender Distribution in Dataset"]
In [80]:
# Sum of AveragePrice by gender (AveragePrice is the price midpoint created by the parse_price cell, which must be run first)
df.groupby('Gender')['AveragePrice'].sum()
Out[80]:
Gender
Female                7180.5
Male                 23529.0
Non-binary            1047.0
Other                   85.0
Prefer not to say      281.5
Name: AveragePrice, dtype: float64
In [35]:
# Define colors for the plot
colors = ["SkyBlue", "Coral", "Goldenrod", "SeaGreen", "SlateGray"]

# Filter to include only rows where "Where do you typically drink coffee?" is not NaN
coffee_consumers_df = df[df['Where do you typically drink coffee?'].notna()]
# Plot count of coffee consumers by Employment Status and Gender
plt.figure(figsize=(14, 4))
ax = sns.countplot(data=coffee_consumers_df, x='Employment Status', hue='Gender', palette=colors)
# Adding labels on top of each bar
for bars in ax.containers:
    ax.bar_label(bars)
# Display the plot
plt.title("Employment Status of Coffee Consumers by Gender")
plt.xlabel("Employment Status")
plt.ylabel("Count of Coffee Consumers")
plt.show()
[Figure: grouped bar chart "Employment Status of Coffee Consumers by Gender" (x: Employment Status, y: Count of Coffee Consumers)]
In [37]:
# Convert a price bracket such as "$4-$6" to a number: strip the '$' signs and take the midpoint of the lower and upper bounds
def parse_price(value):
    if pd.isnull(value):
        return None
    elif '-' in value:  # If the value is a range like "$4-$6"
        low, high = value.split('-')
        return (float(low.replace('$', '').strip()) + float(high.replace('$', '').strip())) / 2
    elif "More than" in value:  # If the value is "More than 20"
        return 20.0  # or choose a higher value like 25
    else:
        try:
            return float(value.replace('$', '').strip())  # Handle single values without ranges
        except ValueError:
            return None  # Default to None if there's an unexpected format

df["AveragePrice"] = df["What is the most you've ever paid for a cup of coffee?"].apply(parse_price)
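As a cross-check on `parse_price`, here is a shorter regex-based variant. This is a sketch under stated assumptions, not the notebook's method: `price_midpoint` is a hypothetical name, and the label "Less than $2" is an assumed bracket that does not appear in the output shown above. Open-ended brackets collapse to their single bound, which is a simplification.

```python
import re
import pandas as pd

def price_midpoint(value):
    """Return the mean of all dollar amounts found in a bracket label.

    "$4-$6" -> 5.0, "More than $20" -> 20.0, missing or unparseable -> None.
    """
    if pd.isna(value):
        return None
    amounts = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", str(value))]
    if not amounts:
        return None
    return sum(amounts) / len(amounts)

# Labels seen in the notebook output, plus one assumed label ("Less than $2")
labels = ["$4-$6", "$15-$20", "More than $20", "Less than $2", None]
for label in labels:
    print(label, "->", price_midpoint(label))
```

Applied with `df[col].apply(price_midpoint)`, this should agree with `parse_price` on range labels and on "More than $20"; unlike `parse_price`, it would also return a number for a "Less than ..." label instead of `None`.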
In [39]:
# Create a 4x2 grid of histograms, one per demographic category plus AveragePrice
fig = make_subplots(
    rows=4, cols=2,
    subplot_titles=("Gender", "What is your age?", "Education Level", "Ethnicity/Race",
                    "Employment Status", "Number of Children", "Political Affiliation", "AveragePrice")
)

# Add histograms for each subplot
fig.add_trace(go.Histogram(x=df['Gender']), row=1, col=1)
fig.add_trace(go.Histogram(x=df['What is your age?']), row=1, col=2)
fig.add_trace(go.Histogram(x=df['Education Level']), row=2, col=1)
fig.add_trace(go.Histogram(x=df['Ethnicity/Race']), row=2, col=2)
fig.add_trace(go.Histogram(x=df['Employment Status']), row=3, col=1)
fig.add_trace(go.Histogram(x=df['Number of Children']), row=3, col=2)
fig.add_trace(go.Histogram(x=df['Political Affiliation']), row=4, col=1)
fig.add_trace(go.Histogram(x=df['AveragePrice']), row=4, col=2)

# Update layout if needed
fig.update_layout(height=1200, width=1000, title_text="Count Plots")
fig.update_layout(showlegend=False)  # Hide the legend if not needed

# Show the figure
fig.show()
In [41]:
#Split the "On the go, where do you typically purchase coffee?" column and explode it
df_expanded = df.assign(
    Purchase_Location=df['On the go, where do you typically purchase coffee?'].str.split(', ')
).explode('Purchase_Location')

#Replace values in the Gender column
df_expanded['Gender'] = df_expanded['Gender'].str.replace('Other', 'Unknown')

#Remove empty entries after exploding
df_expanded = df_expanded[df_expanded['Purchase_Location'] != ""]

#Count occurrences of each location by gender
location_gender_counts = df_expanded.groupby(['Purchase_Location', 'Gender']).size().unstack(fill_value=0)

#Plotting the stacked bar chart
location_gender_counts.plot(kind="bar", stacked=True, figsize=(10, 5), color=['ivory', 'tan', 'olivedrab', 'lightblue'])
plt.title("On-the-Go Coffee Purchase Locations by Gender")
plt.xlabel("Purchase Location")
plt.ylabel("Count of Responses")
plt.legend(title="Gender", bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
[Figure: stacked bar chart "On-the-Go Coffee Purchase Locations by Gender" (x: Purchase Location, y: Count of Responses)]
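The split, explode, and count steps behind this chart can be traced on a few rows. The data below is invented; the column name `Purchase` is a shortened stand-in for the long survey question.

```python
import pandas as pd

# Toy responses: one multi-select answer, one single answer, one empty answer
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Purchase": ["LocalCafe, SuperMarket", "LocalCafe", ""],
})

# Split each answer into a list, then give every list element its own row
long = toy.assign(
    Purchase_Location=toy["Purchase"].str.split(", ")
).explode("Purchase_Location")

# An empty answer splits into [""], so remove those rows
long = long[long["Purchase_Location"] != ""]

# Count rows per (location, gender) and pivot gender into columns
counts = long.groupby(["Purchase_Location", "Gender"]).size().unstack(fill_value=0)
print(counts)
```

Since one respondent can contribute several rows after `explode`, the bar heights count selections, not people.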
In [43]:
# Plotting a pie chart to show the proportion of spending categories
# Grouping the data into spending categories for visualization
spending_categories = df['Approximately how much have you spent on coffee equipment in the past 5 years?'].value_counts(dropna=False)

# Plotting the pie chart
plt.figure(figsize=(6, 6))
plt.pie(spending_categories, labels=spending_categories.index, autopct='%1.1f%%', startangle=140, colors=plt.cm.Paired.colors)
plt.title("Proportion of Spending on Coffee Equipment Over 5 Years")
plt.show()
[Figure: pie chart "Proportion of Spending on Coffee Equipment Over 5 Years"]
In [45]:
# Define the columns for which you want to plot value counts
columns = ['How do you brew coffee at home?', 'What kind of dairy do you add?', 'What kind of sugar or sweetener do you add?', 'What is your favorite coffee drink?']

# Set up a 2x2 grid for subplots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))  # 2 rows, 2 columns
axes = axes.flatten()  # Flatten to easily access each subplot

# Loop through each column, split, explode, and create bar charts
for i, column in enumerate(columns):
    #Split the comma-separated values and explode, then drop null or empty values
    df_expanded = df[column].str.split(', ').explode().dropna()
    df_expanded = df_expanded[df_expanded != '']  # Remove empty strings
    
    #Get the top 3 most common values
    top_3_values = df_expanded.value_counts().nlargest(3)
    
    #Plot the bar chart for the top 3 values
    ax = axes[i]
    ax.bar(top_3_values.index, top_3_values.values,color=['indigo', 'khaki', 'lavender'])  # Create vertical bar chart
    ax.set_title(f'Top 3 Most Frequent {column}')
    ax.set_xlabel(column)
    ax.set_ylabel('Count')
    ax.tick_params(axis='x', rotation=45)  # Rotate x-axis labels for readability

# Adjust layout and spacing
plt.tight_layout(pad=3.0)  # Add padding to reduce overlap between subplots
plt.show()
[Figure: 2x2 grid of bar charts showing the top 3 values for home brewing method, dairy, sweetener, and favorite coffee drink]

Final Insights

After analyzing the data, we have gathered key insights about customer coffee spending patterns by age, gender, education level, and ethnicity/race.

Actionable Insights

• Age: the 25-34 age group is the largest, with about 1,844 customers (35-44: about 882; 18-24: about 398; 55-64: about 163), and it is the group that tends to spend the most.

• Gender: about 75% of purchases are made by male customers and the remaining 25% by female customers, so male consumers are the major contributors to coffee sales. Summing AveragePrice by gender also gives a much larger total for men than for women. These figures are totals, not per-person averages, so much of the gap reflects the larger number of male customers.

•Total AveragePrice for Male customers: 23529.0
•Total AveragePrice for Female customers: 7180.5

• Employment Status and Gender: 2,050 male and 563 female customers fall in the employed category, which suggests that men spend the most during the employed phase. Men also appear to spend less once they are homemakers, possibly because of added responsibilities.
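The gender totals listed above come from `groupby('Gender')['AveragePrice'].sum()`, and a sum grows with group size. The sketch below uses invented numbers to show how count, sum, and mean can be reported side by side; the toy values are assumptions chosen only to make the difference visible.

```python
import pandas as pd

# Toy data: three male respondents and one female respondent (invented values)
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female"],
    "AveragePrice": [5.0, 7.0, 9.0, 9.0],
})

# Report group size, total, and per-person mean together
summary = toy.groupby("Gender")["AveragePrice"].agg(["count", "sum", "mean"])
print(summary)
```

In this toy frame the male total is larger while the female mean is higher, which is why a per-person comparison should rest on the `mean` column. Running the same aggregation on `df` would show which pattern holds in the survey.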

Recommendations

  1. In total, men spend more money on coffee than women. The company should target promotions and offers at female customers to attract more of them and increase their spending.
  2. Customers in the 25-34 age group spend more money than other age groups. The company should focus on acquiring customers from other age groups to broaden its customer base.
  3. Customers mostly prefer buying coffee from specialty coffee shops, with local shops being the least popular choice.
  4. Most customers hold a Bachelor's or Master's degree, and the majority are White/Caucasian.
  5. Over 21% of customers report spending more than $1000 on coffee equipment over the past 5 years. Most customers prefer whole milk and granulated sugar in their coffee.